Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1094] Optimize mechanism of ChunkManager expired shuffle key cleanup to avoid memory leak #2053

Closed
wants to merge 2 commits into from

Conversation

SteNicholas
Copy link
Member

@SteNicholas SteNicholas commented Oct 30, 2023

What changes were proposed in this pull request?

The cleaner of Worker executes the StorageManager#cleanupExpiredShuffleKey to clean expired shuffle keys with daemon cached thread pool. The optimization speeds up cleaning including expired shuffle keys of ChunkManager to avoid memory leak.

Why are the changes needed?

ChunkManager#streams could lead memory leak when the speed of cleanup is slower than expiration for expired shuffle of worker. The behavior that ChunkStreamManager cleanup expired shuffle key should be optimized to avoid memory leak, which causes that the VM thread of worker is 100%.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

WorkerSuite#clean up.

@codecov
Copy link

codecov bot commented Oct 30, 2023

Codecov Report

Merging #2053 (115e27d) into main (cd8acf8) will increase coverage by 0.08%.
Report is 3 commits behind head on main.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##             main    #2053      +/-   ##
==========================================
+ Coverage   46.68%   46.76%   +0.08%     
==========================================
  Files         165      165              
  Lines       10549    10577      +28     
  Branches      961      961              
==========================================
+ Hits         4924     4945      +21     
- Misses       5303     5310       +7     
  Partials      322      322              
Files Coverage Δ
...cala/org/apache/celeborn/common/CelebornConf.scala 87.72% <87.50%> (+0.03%) ⬆️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@SteNicholas SteNicholas force-pushed the CELEBORN-1094 branch 8 times, most recently from c865e2e to a39ad7f Compare November 1, 2023 02:59
@SteNicholas
Copy link
Member Author

@waitinfuture, @FMX, could you help to review this pull requst? Otherwise I should always resolve the conflicts.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. The default value I think can be changed to avoid too many threads.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. Merged into main.

@FMX FMX closed this in 4e8e8c2 Nov 2, 2023
FMX pushed a commit that referenced this pull request Nov 2, 2023
…y cleanup to avoid memory leak

### What changes were proposed in this pull request?

The `cleaner` of `Worker` executes the `StorageManager#cleanupExpiredShuffleKey` to clean expired shuffle keys with daemon cached thread pool. The optimization speeds up cleaning including expired shuffle keys of ChunkManager to avoid memory leak.

### Why are the changes needed?

`ChunkManager#streams` could lead memory leak when the speed of cleanup is slower than expiration for expired shuffle of worker. The behavior that `ChunkStreamManager` cleanup expired shuffle key should be optimized to avoid memory leak, which causes that the VM thread of worker is 100%.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`WorkerSuite#clean up`.

Closes #2053 from SteNicholas/CELEBORN-1094.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
(cherry picked from commit 4e8e8c2)
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants